Exploiting Language Variants Via Grammar Parsing Having Morphologically Rich Information
نویسنده
چکیده
In this paper, the development and evaluation of the Urdu parser is presented along with the comparison of existing resources for the language variants Urdu/Hindi. This parser was given a linguistically rich grammar extracted from a treebank. This context free grammar with sufficient encoded information is comparable with the state of the art parsing requirements for morphologically rich and closely related language variants Urdu/Hindi. The extended parsing model and the linguistically rich grammar together provide us promising parsing results for both the language variants. The parser gives 87% of f-score, which outperforms the multi-path shift-reduce parser for Urdu and a simple Hindi dependency parser with 4.8% and 22% increase in recall, respectively.
منابع مشابه
Morphologically rich Urdu grammar parsing using Earley algorithm
This work presents the development and evaluation of an extended Urdu parser. It further focuses on issues related to this parser and describes the changes made in the Earley algorithm to get accurate and relevant results from the Urdu parser. The parser makes use of a morphologically rich context free grammar extracted from a linguistically-rich Urdu treebank. This grammar with sufficient enco...
متن کاملKnowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language
We study constituent parsing of German, a morphologically rich and less-configurational language. We use a probabilistic context-free grammar treebank grammar that has been adapted to the morphologically rich properties of German by markovization and special features added to its productions. We evaluate the impact of adding lexical knowledge. Then we examine both monolingual and bilingual appr...
متن کاملVerbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language
Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information if harnessed for the task of shallow parsing can lead to dramatic improvements in accuracy for a morphologically rich languageMarathi1. The crux of the approach is to use a powerful morphological analyzer backed by a high coverage lexicon to generate rich features for a C...
متن کاملStatistical Parsing of Spanish and Data Driven Lemmatization
Although parsing performances have greatly improved in the last years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art parsing performances can be achieved on Spanish, a language with a rich verbal morphology, with a non-lexicalized parser trained on a treebank co...
متن کاملWord Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System
We present a constituency parsing system for Modern Hebrew. The system is based on the PCFG-LA parsing method of Petrov et al. (2006), which is extended in various ways in order to accommodate the specificities of Hebrew as a morphologically rich language with a small treebank. We show that parsing performance can be enhanced by utilizing a language resource external to the treebank, specifical...
متن کامل